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We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source 
software package for applying and combining modern methods of data analysis and modeling from complex 
network theory and nonlinear time series analysis, pyunicorn is a fully object-oriented and easily paral- 
lelizable package written in the language Python. It allows for the construction of functional networks such 
as climate networks in climatology or functional brain networks in neuroscience representing the structure 
of statistical interrelationships in large data sets of time series and, subsequently, investigating this struc¬ 
ture using advanced methods of complex network theory such as measures and models for spatial networks, 
networks of interacting networks, node-weighted statistics or network surrogates. Additionally, pyunicorn 
provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time 
series from a non-traditional perspective by means of recurrence quantification analysis (RQA), recurrence 
networks, visibility graphs and construction of surrogate time series. The range of possible applications of 
the library is outlined, drawing on several examples mainly from the field of climatology. 


Network theory and nonlinear time series analysis 
provide powerful tools for the study of complex 
systems in various disciplines such as climatology, 
neuroscience, social science, infrastructure or eco¬ 
nomics. In the last years, combining both frame¬ 
works has yielded a wealth of new approaches for 
understanding and modeling the structure and 
dynamics of such systems based on the statistical 
analysis of network or uni- and multivariate time 
series. The pyunicorn software package (available 
at https://github.com/pik-copan/pyunicorn) facil¬ 
itates the innovative synthesis of methods from 
network theory and nonlinear time series analy¬ 
sis in order to develop novel integrated method¬ 
ologies. This paper provides an overview of the 
functionality provided by pyunicorn, introduces 
the theoretical concepts behind it and provides 
examples in the form of selected use cases mainly 
in the fields of climatology and paleoclimatology. 


Electronic mail: donges@pik-potsdam.de 


I. INTRODUCTION 

Complex network theor}^!^^ and nonlinear time series 
analysis^^ provide two complementary perspectives on 
the structure and dynamics of complex systems. His¬ 
torically, the investigation of complex networks has fo¬ 
cussed on the structure of interactions (links or edges) 
between the possibly large number of subsystems (nodes 
or vertices) of a complex system, e.g. searching for uni¬ 
versal properties like scaling behavior or identifying spe¬ 
cific classes of nodes such as bottlenecks that are particu¬ 
larly important transmitters for flows on the network. In 
contrast, nonlinear time series analysis emphasized dy¬ 
namical aspects such as predictability, chaos, dynamical 
transitions or bifurcations in the observed or modeled 
time-dependent state variables of complex systems. For 
a long time, these communities were mostly disconnected 
and, particularly, applied distinct software tools such as 
igraplP or networkxPSl for analyzing complex networks 
and the classical TI SEAN package for nonlinear time series 
analysi^. 

In the last several years, two strands of research have 
been taken advantage of the synergies obtained by com¬ 
bining complex network theory and nonlinear time series 
analysis. On the one hand, the analysis of functional 
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networks put forward in neuroscienc^I^®^ and climatol- 
og^ffHEo] Q^g application areas such as eco¬ 

nomics and financ^^, applies methods from linear and 
nonlinear time series analysis to construct networks of 
statistical interrelationships among a set of time series 
and, subsequently, studies the resulting functional net¬ 
works by means of methods from complex network the¬ 
ory. On the other hand, network-based time series analy¬ 
sis investigates the dynamical properties of complex sys¬ 
tems’ states based on uni- or multivariate time series data 
using methods from network theor^EU. Various types of 
time series networks have been proposed for perform¬ 
ing this type of analysis, including recurrence networks 
based on the recurrence properties of phase space trajec- 
torie^^^HUl^ transition networks encoding transition prob¬ 
abilities between different phase space region^^ and visi¬ 
bility graphs representing visibility relationships between 
data points in a time serieJ^^JEHl^ 


The purpose of this paper is to introduce the Python 
software package pyunicorn, which implements methods 
from both complex network theory and nonlinear time 
series analysis, and unites these approaches in a per for¬ 
mant, modular and flexible way. Thereby, pyunicorn al¬ 
lows to easily apply recently developed techniques com¬ 
bining these perspectives, such as functional networks 
and network-based time series analysis. Furthermore, the 
software allows to conveniently generate new syntheses of 
existing concepts and methods from both fields that can 
lead to novel methodological developments and fruitful 
applications in the future. While in this tutorial paper, 
the work flow of using pyunicorn is mainly illustrated 
drawing upon examples from climatology, the package is 
applicable to all fields of study where the analysis of (big) 
time series data is of interest, e.g. neuroscienc^I^EIEl 
In this paper, while we aim to give a practical overview 
on the functionality and possibilities of pyunicorn, we 
cannot provide a comprehensive reference or handbook 
due to space constraints. For such a reference, see the 
pyunicorn API documentatiorP^ (see pyunicorn web¬ 
site for newest version). 


This article is structured as follows: After a general 
introduction of pyunicorn and a discussion of the philos¬ 
ophy behind its implementation, software structure and 
related computational issues (Sect.|I|, pyunicorn’s capa¬ 
bilities for analyzing and modeling complex networks are 
described including general networks, spatial networks, 
networks of interacting networks or multiplex networks 
and node-weighted networks (Sec. 0 - Building on this. 
Sect. m presents methods for constructing and analyz¬ 
ing functional networks from fields of multiple time se¬ 
ries, including use cases demonstrating the application 
of climate network and coupled climate network analy¬ 
sis. Section IV describes pyunicorn’s methods for per¬ 
forming nonlinear time series analysis using recurrence 
plots, recurrence networks and visibility graphs. Meth¬ 
ods for generating surrogate time series, which are useful 
for both functional network and network-based time se¬ 
ries analysis, are introduced in Sect. |V| Finally, conclu¬ 


sions are drawn and some perspectives for future further 
development of pyunicorn are outlined (Sect. [VT| ). 


A. Implementation philosophy 


pyunicorn is intended as an integrated container for a 
host of conceptionally related methods which have been 
developed and applied by the involved research groups 
for many years. Its aim is to establish a shared infras¬ 
tructure for scientific data analysis by means of complex 
networks and non-linear time series analysis and it has 
greatly taken advantage from the backflow contributed 
by users all over the world. The code base has been fully 
open sourced under the BSD 3-Clause license. 

With a focus on complex network methods, this soft¬ 
ware is a valuable complement to traditional non-linear 
time series analysis tools such as tiseadP. Its main 
mode of operation is to import, generate and export com¬ 
plex networks from time series or fields thereof, and to 
compute appropriate measures on these networks in order 
to derive insights into the causal structure and dynamical 
regimes of underlying processes. While pyunicorn’s de¬ 
velopment has mostly accompanied advances in climatol¬ 
ogy and paleoclimatology, the generality of the network 
approach and its implementation of extensions to stan¬ 
dard complex networks like spatio-temporal networks, 
node weighted measures, coupled functional networks 
and recurrence networks render the software widely ap¬ 
plicable in numerous fields, e.g. medicine, neuroscience, 
sociology, economics and finance. Great care has been 
taken in linking to relevant publications from the method 
descriptions contained in the code and API documenta- 
tioiPI 

As the name suggests, the language chosen for the im¬ 
plementation is Pytho n, wh ich is very well established 
in scientific computin^^^H^ to the multiplicity of 

useful combinations of methods, there are no executa¬ 
bles in pyunicorn, but the library is intended to be used 
by small Python scripts. Its object-oriented software ar¬ 
chitecture allows for clean and flexible code representing 
the logical interrelationships and depen denc ies between 
the various concepts and methods (Sect. For exam¬ 
ple, the class RecurrenceNetwork (Sect. IVA2) inherits 
from both the Networ k (Sect. [ll^ and RecurrencePlot 
classes (Sect. |!vaT] ), thus reflecting the mathematical 
definition and historical development of recurrence net¬ 
work analysis (Fig. IT)- Following a similar reasoning be¬ 
hind the implementation of a class hierarchy, the c limate 
network class MutualinfoClimateNetwork (Sect. IniBl ) 
inherits from the Network class via the intermediate par¬ 
ent cl asses G eoNetwork (Sect. [ll^ ) and ClimateNetwork 
(Sect. |IIIB ), additionally including several object com¬ 
position relationships on the way (Fig. IT)- 

While ensuring accessibility and maintainability 
among scientists in the disciplines mentioned above, this 
design facilitates fully flexible use of the package, from 
interactive local sessions in IPythoiP^ to massively par- 
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FIG. 1. Examples for the software architecture of pyunicorn displayed as a Unih ed Modeling Language (UML) diagram of class 
relationships: ance stry of the (A) timeseries .RecurrenceNetwork (Sect. [TV A 2| and (B) climate .MutualInfoClimateNetwork 
classes (Sect. [m^ . Inheritance (class B inherits from class A, solid arrows) and object composition relationships (class B contains 
class A, dashed arrows) are indicated. 


allel computations on cluster architectures. For several 
years now, pyunicorn has been successfully deployed on 
Linux, Mac OS X and Windows systems as well as UNIX 
high performance clusters. 

Besides Nump^/I^ and Scip^/^^, which are among the 
most widely spread libraries for scientific computing 
in Python, pyunicorn’s only hard dependency is the 
igraph network analysis packag^. pyunicorn does not 
possess its own graphical interface, but where visual out¬ 
put is meaningful, helper methods exist for plotting with 
matplotlitP^, which is especially convenient in IPython. 
Interfaces to tools for advanced network vi sualiz ation fo¬ 
cussing on spatial networkJ^ such as CG#iE2l 

are pro¬ 
vided via pyunicorn’s input-output capabilities. Com¬ 
mented examples for typical use cases are provided by 
the extensive software documentation. 


B. Software structure 

The pyunicorn library is fully object-oriented and its 
inheritance and composition hierarchy reflects the rela¬ 
tionships between the analysis methods in use (Fig. [^. 
It consists of five subpackages (Tab.[^: 

core: This name space contains the basic building 
blocks for general network analysis and modeling 
and is accessible after calling import pyunicorn 
(Sect. 0 - The backbone Network class provides 
numerous standard and advanced complex network 
statistics, measures and generative models as well 
as import and export capabilities to GraphML, 
GML, NCOL, LGL, DOT, DIMACS and other for¬ 
mats. Grid and GeoNetwork extend these func¬ 
tionalities with respect to spatio-temporally em¬ 


bedded networks, which can be imported from and 
exported to ASCII and NetCDF files via the Data 
class. InteractingNetworks provides advanced 
methods designed for networks of networks (or mul¬ 
tiplex networks), while ResNetwork specializes in 
power grids transporting electric currents and re¬ 
lated infrastructure networks. 


funcnet: Advanced tools for construction and analysis 
of general (non-climate) functional networks will 
be accommodated here. So far, CouplingAnalysis 
calculates cross-correlation, mutual information, 
mutual sorting information and their respective 
surrogates for large arrays of scalar time series 
(Sect.lm). 


climate: Building on top of GeoNetwork and Data, the 
ClimateNetwork class and its children facilitate 
the construction and analysis of functional net¬ 
works representing the statistical interdependency 
structure within a field of time series, based on 
similarity measures such as lagged linear Pearson 
or Sp earma n correlation and mutual information 
(Sect. [nr^ ). CoupledClimateNetwork extends this 
capability to the study of interrelationshi ps be - 
tween two distinct fields of observables (Sect. |nT^ . 


timeseries: This subpackage provides various tools for 
the analysis of non-linear dynamical systems and 
uni- and multivariate time series (Sect. IV). Apart 
from visibility graphs with time-directed mea¬ 
sures (VisibilityGraph class), the focus lies 
on recurrence-based methods derived from the 
RecurrencePlot class. These include single, joint 
and cross-recurrence plots as well as standard, joint 
and inter-system recurrence networks, supporting 
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TABLE L Structure of the pyunicorn software package listing the most important classes belonging to each submodule (selection 
for brevity). 


core 

funcnet 

climate 


timeseries 

utils 

(Sect. 1^ 

(Sect, nil 

(Sects. 

iiiB me 

) 

(Sects. IV q 

(Sect. B 


Network CouplingAnalysis ClimateNetwork 

GeoNetwork CouplingAnalysisPurePython CoupledClimateNetwork 

InteractingNetworks TsonisClimateNetwork 

ResNetwork SpearmanClimateNetwork 

Data 
Grid 


RecurrencePlot 
CrossRecurrencePlot 
JointRecurrencePlot 
RecurrenceNetwork 
MutualinfoClimateNetwork InterSystemRecurrenceNetwork 
ClimateData JointRecurrenceNetwork 

VisibilityGraph 

Surrogates 


mpi 

navigator 


time-delay embedding and several phase space met¬ 
rics and are equipped with common measures of re¬ 
currence quantification analysis. Surrogates allow 
testing for the statistical significance of similarity 
measures by generating surrogate data sets under 
miscellaneous constraints from observable time se¬ 
ries (Sect. [V|). 

utils: Currently this includes MPI parallelization support 
and an experimental interactive network navigator. 


C. General computational issues 

Most network measures are defined as aggregates of 
local information obtained from topology, node weights 
and link attributes. Since pyunicorn internally repre¬ 
sents networks as sparse adjacency matrices, it can han¬ 
dle large data sets. Until streaming algorithms are imple¬ 
mented for all measures, the available amount of working 
memory (RAM) limits the size of networks which can be 
processed. Presently, many advanced methods are de¬ 
fined only for undirected networks. As sensible gener¬ 
alizations and applications come up, they are gradually 
incorporated into the code base. 

As is usually the case with Python libraries, pyunicorn 
is designed to provide simple interfaces and clear archi¬ 
tecture, while delegating the heavy lifting to specialized 
tools for performance. Basic network measures and gen¬ 
erative models are inherited from igraph. Wherever pos¬ 
sible, numerically intensive computations are expressed 
as combinations of highly optimized linear algebra meth¬ 
ods from Numpy and Scipy, and otherwise implemented 
in embedded CythoiPl code. Thus all costly computa¬ 
tions are performed in compiled C, C+-1- or FORTRAN 
code. Parallelization is mostly not implemented on the 
algorithm level, but can be achieved using the built-in 
MPI helper for repetitive tasks, e.g. computing measures 
on recurrence networks for different time windows of an 
observable. As the required RAM size is mostly depen¬ 


dent on the volume of data to be analyzed, a modern 
laptop processor with a single core suffices to perform 
most of the computations described later on for currently 
typical data sets in a matter of seconds to an hour. As 
an example, the recurrence network displayed in Fig. [Tb] 
takes approximately 0.03 seconds to compute on a dual¬ 
core Intel Core i5 CPU with 2.4 GHz running Mac OS 
X. For illustration, a more systematic study of the per¬ 
formance of recurrence network construction, a common 
task of using pyunicorn, is displayed in Fig. GPU 
computations are currently not supported. 



FIG. 2. Dependence of computation time for constructing 
recurrence networks (represented by RecurrenceNetwork ob¬ 
jects) of varying size N constructed from state vectors ran¬ 
domly drawn from the Lorenz attractor (see Fig. |15| ) on a 
dual-core Intel Core i5 CPU with 2.4 GHz running Mac OS 
X (star symbols). For larger N, computation time is propor¬ 
tional to (solid line) as expected for the straightforward 
algorithm implemented for calculating the N x N recurrence 
matrix R. Note that fast neighbor search algorithms, cur¬ 
rently not implemented in pyunicorn, can reduce the number 
of required computations to 0{N log N) or even 0{N) under 
certain condition^^. 
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II. COMPLEX NETWORK ANALYSIS 


pyunicorn provides methods for analyzing and mod¬ 
eling various types of com plex networks, includin g gen - 
eral networks (Sect. Ea), spatial netw orks ( Sect. |IIB[ ), 
networks of interacting networks (Sect. EQ and node- 
weighted networks (Sect. [H^ ). In the following, the cor¬ 
responding classes and methods are described together 
with a brief introduction of the underlying theory. Se¬ 
lected use cases illustrate the associated functionality of 
pyunicorn. 


A. General complex networks 


The class Network in the submodule core serves as a 
parent to all other network-related classes in pyunicorn 
(see, e.g. Fig. Q and represents general undirected and 
directed networks or graphs G = iV^E) consisting of a 
set of nodes V = {!,..., N} and a set of (directed) links 
ECV X V without dublicates. Networks of this type can 
be described by an adjaeeney matrix A with elements 


A 


pq — 


{p, q) eE 
otherwise. 


(1) 


Hence, Apq = 1 iff nodes p and q are connected by 
a (directed) link and Apq = 0 iff they are uncon¬ 
nected. In pyunicorn, instances of the Network class 
can be initialized using such dense adjacency matri¬ 
ces, but also based on sparse mat rices or link lists. 
Link and node weights (see Sect. IID) can be rep¬ 
resented by link and node attributes and are acces¬ 
sible through the Network. set_link_attribute and 
Network. set_node_attribute methods, respectively. 


Many standard complex network measures, network 
models and algorithms are supported, e.g. degree, close¬ 
ness and shortest-path betweenness centralities, clus¬ 
tering coefficients and transitivity, community detec¬ 
tion algorithms and network models such as Erdds- 
Renyi, Barabasi-Albert or configuration model random 
networkJ^. Several of these measures provided by 
pyunicorn can take into account directed links and link 
weights (directed and weighted networks) if this infor¬ 
mation is present. However, the remainder of this article 
focusses on undirected networks, reflecting the current 
state of the pyunicorn implementation. 

Moreover, a number of less common network statis¬ 
tics such as Newman ’sP or Arenasrandom walk be¬ 
tweenness can be computed. Reading and saving net¬ 
work data from and to many common data formats such 
as GraphMlP^ is possible for storage and information 
exchange with other software for network analysis and 
visualizatiorP^ such as networkxP^ or gephi^^. 
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FIG. 3. Effects of spatial embedding of nodes on network 
structure: (A) Probability for links in the US interstate 
network to be also present under the application of Geo- 
Model 1 computed over an ensemble of 100 surrogate net¬ 
works using GeoNetwork.randomly_rewire_geomodel_I. (B) 
The same under the application of GeoModel2 provided by 
GeoNetwork. randomly _rewire_geomodel_II. 


B. Spatially embedded networks 

Many, if not mo^ complex networks of interest are 
spatially embeddecP^. Gonsider, for example, social 
networks, infrastructure networks such as the internet, 
road and other transportation networks (Eig. or 
functional networks in neuroscience and climatology 
(Fig. [^. pyunicorn includes measures and models 
specifically designed for spatially embedded networks 
(or simply spatial networks) via the GeoNetwork class 
that inherits from the Network class (Eig. [^). Ghar- 
acteristics of the nodes’ spatial embedding, such as 
all longitudinal and latitudinal coordinates, are stored 
in the Grid class. In particular, this class then pro¬ 
vides methods for computing and evaluating spatial 
distances between all pairs of nodes via the methods 
Grid. angular _distance. Grid. euclidean_distance 
and Grid.geometric_distance_distribution. Addi¬ 
tionally, functionality for loading and saving the grid 
from and to common file formats such as ASGH is 
provided. 


1. Measures and models for spatial networks 

Measurements on the network’s spatial embedding 
are performed by using the class GeoNetwork which is 
initialized with an existing instance of the Grid class 
(Fig. [^). Generally, all standard network measures. 
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like the degree or clustering coefficient, can be com¬ 
puted in an area-weighted variant taking into account 
the network’s spatial embedding and, hence, avoiding 
biases caused by the potentially widely different sur¬ 
face areas or volumes that nodes may represent (see 
Sec. IID for details). In addition, the distribution 
of the links’ spatial lengths are evaluated using the 
method GeoNetwork. link_distance_distribution. 
For each node in the network the lengths of 
its emerging links can be assessed via the 
methods GeoNetwork. average_link_distance, 

GeoNetwork. total_link_distance and 

GeoNetwork. max_link_distance which all give a 
notion of the spatial distance between a specific node 
and its neighbors. In the application to climate sci¬ 
ences, where links in the network typically represent 
interdependencies of statistical significance between 
climate observables taken at different locations on the 
Earth’s surface, the above mentioned measures are 
of crucial importance when in vestigati ng the presence 
of long-ranging teleconnectionJ^^^^^^^ in the climate 
network (Sect. IIIB). 

In addition to the evaluation of a spatially embed¬ 
ded network’s topological structure, the GeoNetwork 
class also provides random network models to con¬ 
struct spatially embedded networks under the same spa¬ 
tial constraints, i.e. with the same spatial distribu¬ 
tion of nodes, as the network under study. These 
spatial network surrogates allow to assess to what de¬ 
gree certain properties of an observed network are 
consistent with those expected from a structural null 
model that is encoded in the construction rules for 
the network surrogates. In particular, the method 
GeoNetwork. set_random_links_by_distance constructs 
a random network in which the probability for the pres¬ 
ence of a link between two nodes decays exponentially 
with the geographical distance between them. Fur¬ 
thermore, three different GeoModels are implemented 
in pyunicorn which construct random network sur¬ 
rogates of a given network by iteratively rewiring 
its links under different conditions: (i) GeoModell 
(GeoNetwork.randomly_rewire_geomodel_l) creates a 
random network with the same link-length distri¬ 
bution and link density as the one represented by 
the respective instance of GeoNetwork. (ii) Ge- 
oModel2 (GeoNetwork. randomly_rewire_geomodel_Il) 
additionally preserves the local link-length distri¬ 
bution and degree of each node, and (iii) Geo- 
ModelS (GeoNetwork.randomly_rewire_geomodel_IIl) 
additionally sustains the degree-degree correlations (or 
assortativity) of the original networlP^. 


2. Use case: US interstate network 

We illustrate the application of these random spatial 
network models by constructing 100 surrogate networks 
of the US interstate networlJ^ utilizing GeoModell and 


GeoModel2^ respectively. One way to quantify how well 
the network under study is represented by each of the 
two models is to compute the probability that each of its 
links is also present in the ensemble of random surrogates 
(Fig. 1^. We find that GeoModell already reproduces 
well many of the original links in the US interstate net¬ 
work (Fig. |3}\) , implying that its structure is already well 
determined by its global link length distribution. Addi¬ 
tionally preserving the local link length distributions and 
degree sequence improves the results further: most links 
of the original network are present in the surrogate net¬ 
works with high probability (Fig. |^). 


C. Networks of interacting networks 



FIG. 4. A network of interacting networks consisting of 
three subnetworks Gi that can be represented and analyzed 
using the class core. InteractingNetworks. Nodes are rep¬ 
resented by geometric symbols, while internal and cross-links 
are indicated by solid and dashed lines, respectively. 

The structure of many complex systems can be de- 
scrib ed as a network of interaeting or interdependent net- 
workS^^^^ e.g. the densely entangled i nfrast ructures for 
communication and energy distributiorP^^^. Constitut¬ 
ing a specific but important subclass of multiplex or mul¬ 
tilayer network^^, these networks of networks can be 
represented by decomposing a network G = (U, F) into 
a collection of M subnetworks Gi = (Vi^Ea) (Fig. [^. 
Here, U denote the disjunct sets of nodes corresponding 
to each subnetwork and the internal links sets Ea contain 
information on the connections within a subnetwork such 
that U^i ^' Additionally, disjunct sets of cross¬ 
links Eij connect nodes in different subnetworks with 
Uf^=i ^ij — Alternatively, a network of networks 
(multiplex network) of this type can be represented by 
a standard adjacency matrix A with block structure^. 
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Depending on the problem at hand, the decomposition 
into subnetworks can be given a priori^ as in the example 
of interdependent communication and electricity grids, or 
may be obtained from solutions of a community detection 
algorithm applied to a complex network of interestP^K^. 


1. Measures and models for networks of networks 

The InteractingNetworks class in the submodule 
core provides a rich collection of network mea¬ 
sures and models specifically designed for in vesti- 
gating the structure of networks of network j^^ l ^^ l 
Relevant examples include the cross-link den¬ 
sity of connections between different subnetworks 
(InteractingNetworks. cross_link_density) 


\Ei. 


Pij — 


IViWVji 


( 2 ) 


the cross-degree or number of neighbors of 
node V G Id in a different subnetwork 
(InteractingNetworks. cross_degree) 




qeVj 


( 3 ) 


or the cross-shortest path betweenness 

(InteractingNetworks. cross_betweenness) defined 
for all nodes w eV: 




E 




(w) 


( 4 ) 


peVi,qeVj]p,q^w 


quantifying the importance of nodes for mediating inter¬ 
actions between different subnetworks, where apq denotes 
the total number of shortest paths from p ^Vi to q ^Vj 
and (Tpq{w) counts the number of shortest paths between 
p and q that include w. The InteractingNetworks class 
also contains node-weighted version s of m ost of the pro¬ 
vided statistical measures (see Sect. IID). 

Surrogate models of interacting networks allow the 
researcher to assess the degree of organization in the 
cross-connectivity between subnetworks and its effects 
on other network properties of interest such as (cross-) 
clustering and (cross-) transitivity or shortest-path 
based measures such a s ( cross-) average path length 
or (cross-) betweenness^. Specifically, pyunicorn 
currently supports two types of interacting network 
models that conserve (i) the number of cross-links or 
the cross-link density between a pair of subnetworks 
(InteractingNetworks.RandomlySetCrossLinks), 
analogously to the Erdds-Renyi modeP^ for 
general complex networks, and (ii) the cross¬ 
degree sequences between a pair of subnetworks 
(InteractingNetworks.RandomlyRewireCrossLinks), 
corresponding to the configuration model for standard 
network^. 




FIG. 5. Visualization of the Zachary karate club network 
with node colors indicating the two groups emerging after 
fission that were lead by individuals 0 (subnetwork Gi, red 
nodes) and 33 (subnetwork G 2 , blue nodes), respectively (note 
that the original node numbering of Zachary^ can be ob¬ 
tained by adding 1 to these indices). Node size is scaled ac¬ 
cording to the (A) cross-degree and (B) cross-betweenness 
(log-scale) measures of interacting networks analysis. Inter¬ 
nal links are colored in red and blue, while cross-links are 
displayed in violet. 


In the context of time series analysis, the interact¬ 
ing network representation has been applied for study¬ 
ing the structure of statistical interrelationships between 
differ ent c limatol ogical fields with coupled climate net- 


work J^" ' ^" ' (Sect. IIIC) as well as for detecting the di¬ 
rection of coupling between complex dynamical syste ms 
using inter-system recurrence networks^ (Sect. IV A). 


2. Use case: Zachary karate club network 

For illustrating interacting networks analysis based on 
a simple and commonly studied example, we choose the 
classical Zachary karate club social network that de¬ 
scribes friendship relationships between 34 members of 
a karate club at a US universit}!^. During the course 
of the study, a disagreement developed between some of 
the members and the club split up into two parts. Here, 
we represent the groups after fission by two subnetworks 
Gi (lead by individual 0) and G 2 (lead by individual 33) 
with internal and cross-links set according to the friend¬ 
ship ties revealed in the original study (Fig. [ 5 ]). The 
groups emerging after fission are clearly reflected in the 
social network structure as was al so fou nd using vari¬ 
ous community detection algorithm d^^ l ^^ i The cross-link 
density pi 2 ~ 0.04 is significantly smaller than the inter¬ 
nal link densities pn ~ 0.26 and P 22 ~ 0.24, underlining 

















the conceptual similarities between interacting network 
characteristics such as cross-link densities or cross-degree 
sequences and measures of modularity used for commu¬ 
nity detect ioiP^. 

Furthermore, studying local interacting network mea¬ 
sures yields insights into the roles of nodes with respect 
to interactions between the two groups. For example, 
nodes on the interface between both groups such as indi¬ 
viduals 2, 8 and 30 tend to have large cross-degree and 
cross-betweenness values compared to other nodes on the 
groups’ peripheries, because they serve as important con¬ 
nectors between the groups (Fig. [^. Focussing on the 
two group leaders, it is interesting to note that individ¬ 
ual 0 (the instructor) has a low cross-degree (just one 
cross-link to G 2 , = 1) compared to individual 33 (the 

administrator, = 3), while both leaders have compa¬ 
rable first and second largest values of cross-betweenness, 
bl‘^ ^ 148 and 633 ~ 94, respectively. This observation 
indicates that cross-betweenness is a more reliable indica¬ 
tor for leadership with respect to the groups’ interaction 
structure in this case. 


D. Node-weighted networks and node-splitting invariance 

The nodes of many real-world networks, e.g. firms, 
countries, grid cells, brain regions etc., are of heteroge¬ 
neous size, represent different shares of an underlying 
complex system, or are of different prior relevance to 
the research question at hand. As a specific example, 
in climate networks on regular latitude-longitude grids 
(see below and Fig. [^, nodes in polar regions repre¬ 
sent a significantly smaller fraction of the Earth’s surface 
than do nodes in the tropics. If this heterogeneity can 
be expressed in a vector of node weights Wy > 0 with 
V e V, a node-weighted network analysis seems appro¬ 
priate. Because many complex systems allow for network 
representations of different granularity, the results of such 
a node-weighted network analysis should be consistent 
across scaled. pyunicorn provides node-weighted vari¬ 
ants of most standard and many non-standard measures 
for networks (Network class) as well as interacting net¬ 
works (interactingNetworks class). 


C*,C* are defined as 

c* = j2wvc:/j2^v. ( 5 ) 

vEV vEV 

In contrast to their unweighted counterparts, these and 
all other n.s.i. measures have the following consistency 
property: When a node v and its weight Wy are split into 
two interlinked nodes v' ^ v" with weights Wyf -\-Wy" = Wy 
that are connected to the same nodes as v was, then all 
n.s.i. measures of nodes other than v'^ and v" remain 
unchanged (e.g. the n.s.i. clustering coefficient of some 
neighbour p oi v remains unchanged while the ordinary 
clustering coefficient of p would increase). This scale 
consistency comes at the price that in the special case 
where all weights are equal to unity, Wy = n.s.i. mea¬ 
sures do not simply reduce to their unweighted counter¬ 
parts but return slightly different values. For this reason, 
there exist also corrected n.s.i. measures that addition¬ 
ally take into account an overall typical weight u; > 0 and 
have the property that in the special case where all node 
weights equal cj, the corrected n.s.i. measure equals its 
unweighted counterpart!^ . For example, the eorreeted 
n.s.i. degree and loeal elustering eoeffieient are 

1 _ T.p,qev 

i, - K-(K--I) 

(6) 

In pyunicorn, n.s.i. measures are available in the 
Network and InteractingNetworks classes (method 
prefix nsi_), e.g.: 

nsi_arenas_betweenness, 
nsi_average_neighbors_degree, 
nsi_average_path_length, 
nsi_betweenness, 
nsi_closeness, 
nsi_degree. 

Their syntax is the same as that of their unweighted 
counterparts, and some have an additional optional key¬ 
word parameter typical_weight via which the corrected 
n.s.i. measures can be requested. To use these methods, 
one has to provide node weights before, either manually 
via the Network property node_weights, the keyword pa¬ 
rameter node_weights of the class constructor, or auto¬ 
matically via the keyword parameter node_weight_type 
of the constructor of the derived class GeoNetwork. 


/i + 
^pq 


^pq T ^pq-) 


pev 




1. Measures for node-weighted networks 


2. Use case: spatial network structures in polar region 


The t heory of node-splitting invariant (n.s.i.) network 
measurehas derived variants of many classical 
network measures that take into account node weights in 
a consistent way. For example, the n.s.i. adjaeeney ma¬ 
trix degree k^, loeal and global elustering eoeffieients 


Figure presents an application of n.s.i. degree and 
local clustering coefficient to the functional climate net¬ 
work of surface air temperature dynamics in the northern 
polar regiorl^. A ClimateNetwork object net was gen¬ 
erated as described in Sect. IIIB with nodes placed on a 
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FIG. 6. Comparison of unweighted and weighted (n.s.i.) ver¬ 
sions of degree (A,C) and local clustering coefficient (B,D) in 
the northern polar region (Lambert equal area projection) of 
a global climate network representing correlations in temper¬ 
ature dynamics. The high values at the pole in (A,B) turn 
out to be an artifact of the increasing grid density toward the 
pole, as demonstrated by (C,D). Reproduced with permission 
from Eur. Phys. J. B 85, 38 (2012) Heitzig et al)^. Copy¬ 
right 2012 European Physical Society and Springer-Verlag. 


regular latitude-longitude grid on the Earth’s surface. To 
reflect that grid’s varying node density depending on lat¬ 
itude, the node weights were set to the cosine of latitude 
by using the ClimateNetwork constructor’s keyword pa¬ 
rameter node_weight_type=" surf ace" (inherited from 
GeoNetwork, see Fig. [^). Then all nodes’ n.s.i. degree 
and local clustering coefficient were computed via 


kstarvector = net.nsi_degree() 
Cstarvector = net.nsi_local_clustering() 


and plotted using the package matplotlib. When com¬ 
paring the resulting node-weighted measures (Fig.[^,D) 
to the unweighted degree and local clustering coefficient 
(Fig.|^ ,B), one realizes that the latter measures’ high 
values around the pole (dark spots) are actually artifacts 
of the relatively higher node density, an effect that is 
compensated for in the n.s.i. measures. 


III. FUNCTIONAL NETWORKS: CONSTRUCTION 
AND ANALYSIS 


Functional networks provide a powerful generalization 
of standard methods of bi- and multivariate time se¬ 
ries analysis by allowing to study the dynamical re¬ 
lationships between subsystems of a high-dimensional 
complex system based on spatio-temporal data and us¬ 
ing the tools of network theory, pyunicorn provides 
classes for the construction and analysis of functional net¬ 
works representing the statistical interdependency struc¬ 
ture within and between sets (fields) of time series us¬ 
ing various similarity measures such as l agged Pearson 
correlation and mutual information (Sect. Ill A). Build¬ 
ing on these similarity measures, climate networks al¬ 
low for the analysis of single fields of climatological time 
series, e.g. surface air temperatu re obs erved on a grid 
covering the Earth’s surface (Sect. IIIB). Moreover, cou¬ 
pled climate networks focus on studying the interrela¬ 
tionships between two or more fields of climatological 
time series, e.g. sets of time series capturing sea surface 
temperature and atmospheric geopotential height vari- 
ability (Sect. |mC ). Functional network analysis is illus¬ 
trated drawing upon several examples from climatology, 
including the detection of spatio-temporal regime shifts 
in the atmosphere and ocean. While pyunicorn pro¬ 
vides some functionality specific to climate data (such as 
the climate.ClimateData class), the methods for gen¬ 
eral functional network analysis can also be applied to 
other sources of time series such as ^neral fluid dynam¬ 
ical and pattern formation system^, neuroscience (e.g. 
functional magnetic resonance imaging, fMRI, and elec¬ 
troencephalogram [EEC] data; Bullmore and Sporns^^ 
or finance (e.g. stock market indices; Huang, Zhuang, 
and Yao>^. 


A. Coupling analysis 


The timeseries. CouplingAnalysis class provides 
methods to estimate matrices of (optionally lagged) sta¬ 
tistical similarities S between time series including on the 
linear Pearson pro duct-moment correlation and measures 
from information theory such as mutual information and 
extensions thereof. These matrices can be thresholded 
to obtain directed or undirected adjacency matrices for 
further network analysis with pyunicorn, e .g. a s in¬ 
put to the climate .ClimateNetwork (S ect. |IIIB ) and 
climate. CoupledClimateNetwork (Sect. [HI C ) classes. 
The similarity values can also be used in link-weighted 
network measures such as those provided by the Network 
class. 


1. Similarity measures for time series 

While standard measures such as the classical linear 
Pearson product-moment correlation are only briefly dis- 
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cussed, this section focusses on more innovative measures 
based on information theory that are provided by the 
CouplingAnalysis class. The latter include bivariate 
mutual information as well as its multivariate extensions 
allowing to reduce the effects of common drivers or in¬ 
direct couplings (Fig. [^) on estimates of information 
transfer between two processes X, Y {X and Y repre¬ 
sent time series at nodes p and respectively). 

All methods share the parameters taujnax, the max¬ 
imum time lag, and lagjnode, which can be set to 
^all^ to obtain a 3-dimensional Numpy array of shape 
(N, N, taujnax + 1) containing lagged similarities be¬ 
tween all pairs of nodes, or to ^max^ to return two (N, N) 
Numpy arrays indicating the lag positions and values of 
the absolute similarity maxima. 

a. Lagged cross-correlation The lagged Pearson 
product-moment correlation coefficient (CC) of two 
zero-mean time series variables X, F, implemented in 
CouplingAnalysis . cross_correlation, is given by 


Pxy{t) 


{Xt-r,Yt) 

ax<JY 


(7) 


which depends on the covariance Yj) and standard 

deviations cry- Lags r > 0 correspond to the linear 
association of past values of X with F, and vice versa 
for r < 0. In analogy, the auto-correlation is defined as 
pyy(r) for r > 0. The choice lagjiiode=^max^ returns 
the value and lag at the absolute maximum for each or¬ 
dered pair (i, j), which can be positive or negative. CC 
is computed using the standard sample covariance esti¬ 
mator. It can be estimated for comparably small sample 
sizes. However, by definition, it does not allow to quan¬ 
tify nonlinear associations between time series and can 
produce misleading results in the presence of strongly 
nonlinear relationships. 

b. Lagged mutual information Information theor^EI 
provides a genuine framework to capture also nonlinear 
associations. While Shannon entrop^P^ is a measure of 
the uncertainty about outcomes of one process, mutual 
information (MI) is a measure of its reduction if an¬ 
other process is known. The Shannon-type MI can be 
expressed as 


/(X; F) = H{Y) - H{Y\X) = H{X) - H{X\Y), (8) 

i.e. as the difference between the uncertainty in F and 
the remaining uncertainty if X is already known (and 
vice versa). MI is symmetric in its arguments, non¬ 
negative and zero iff X and F are statistically indepen¬ 
dent. The lagged cross-MI for two time series, imple¬ 
mented in CouplingAnalysis.mutual_information, is 
given by 


j“pr) = J(y_,;Y*)- (9) 

For r > 0, one measures the information in the past 
of X that is contained in F, and vice versa for r < 0. 
Correspondingly, the auto-MI is defined as /(l^-^-; F^) for 
r > 0. 


Three different estimators are provided reflecting dif¬ 
ferent trade-offs between number of samples required, 
bias and variance of the estimator, and computational 
requirements: 

• estimator= ^ binning ^: A very simple method is to 
quantize or partition the observation space into a 
set of bins (parameter bins). Here, we use equi- 
quantile bins where the bin intervals are chosen 
such that the marginal distributions are uniforrrP^. 
While this estimator is consistent for infinite sam¬ 
ple size, for common sample sizes of the order 10^, 
many bins are not populated sufficiently resulting 
in heavily biased values of MP^. For example, for 
independent time series the estimated MI values do 
not center around zero. 

• estimator= ^knn^: A more advanced estimator for 
continuously valued variables, recommended here, 
is based on nearest-neighbor statisticP^. This esti¬ 
mator is discussed in its conditional form below. 

• estimator= ^ gauss ^: If only the linear part of an 
association is desired, assuming a bivariate Gaus¬ 
sian distribution, the MI is simply given by 

I^T%r) = -^ln{l-pxYirf), (10) 

where Pxy{t) is again the Pearson correlation co¬ 
efficient. 

c. Lagged information transfer While lagged MI can 
be used to quantify whether information in F has already 
been present in the past of X, this information could 
also stem from the common past of both processes and, 
therefore, is not necessarily a sign of a transfer of unique 
information from X to F. A first step toward a notion 
of directionality (the more demanding causality problem 
is discussed at the end of this section) is to assess a bi- 
varia te no tion of information transfer between two time 
seriepSEHl in order to exclude this common past. Here, we 
consider two measures to achieve this goal, implemented 
in CouplingAnalysis . inf ormation_transfer. These 
are based on conditional mutual information (CMI) de¬ 
fined as 


/(X; F|F) = H{Y\Z) - H{Y\X, Z) (11) 

which can be phrased as the mutual information between 
X and F that is not contained in a third, possibly multi¬ 
variate variable Z. CMI shares the properties of MI and 
is zero iff X and F are independent conditionally on Z. 

Following Runge et the bivariate information 

transfer to Y (ITY), obtainable via the parameter 
condjiiode= ^ ity b is defined as 

/yiy(r) = .. .,Yt-p). (12) 

It excludes the past of the ‘driven variable’ F up to a 
maximum lag p (parameter past). ITY can be seen as a 
lag-specific transfer entropy^. A more rigorous way to 
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exclude commonly shared past is to additionally condi¬ 
tion out the past of the ‘driver’ variable X. The bivari¬ 
ate momentary information transfer (MIT), called via 
condjiiode=^mit b can be defined as 

= 

IiXt-r;Yt\Yt-i,.. .,Yt_p,Xt-r-i ,..., (13) 

The attribute momentar'^^ is used because MIT mea¬ 
sures the information of the “moment” t — rinX that is 
transferred to Yf. MIT can also be interpreted as a mea¬ 
sure of causal strength as discussed in Runge et al^^ 
where the multivariate versions of ITY and MIT are de¬ 
fined. On the downside, it is higher-dimensional resulting 
in a larger bias for the nearest-neighbor estimator. 

Two estimators are available: 


• estimator= ^gauss ^: Like in Eq. ( 10 ), the CMI for 
multivariate Gaussians can be expressed in terms 
of the partial eorrelation^ where the Pearson corre¬ 
lation pxY is replaced by pxY\z- 


• estimator=^knnL A nearest-neighbor CMI esti¬ 
mator has been developed by Frenzel and Pompeii. 
This estimator is computed by choosing a parame¬ 
ter k (knn) as the number of neighbors in the joint 
space of (X, T, Z) around a sample at time t. The 
maximum-norm distance to the k-th nearest neigh¬ 
bor then defines a hypercube of length 2et for each 
joint sample. Then the numbers of points kz^t^ kxz,t 
and kyz^t in the subspaces with distance less than 
et are counted, and the CMI is estimated as 


/(X; YjZ) = i,{k) + 1 ^ - i’ik.zV - > 

t = l 

(14) 

where is the Digamma function and T is the 
number of samples. Smaller values of k result in 
smaller cubes and, since as assumed in the estima¬ 
tor’s derivation, the density is approximately con¬ 
stant inside these, the estimator has a low bias. 
Conversely, for large k the bias is stronger, but the 
variance is smaller. Note, however, that for inde¬ 
pendent processes the bias is approximately zero, 
i.e. /(X;T|Z)^ 0 , and large k are therefore better 
suited for (conditional) independence tests, e.g. on 
whether a link exists between two time series. 


2. Use case: coupled stochastic processes 

Consider the following simple four dimensional process 
to illustrate the different measures (Fig. 

Xi{t) = 0.8Xi(i - 1) + rii{t) 

X2{t) = 0.8X2{t - 1 ) + O.bXiit - 2 ) + 7]2{t) 

X3{t) = 0.7Xi{t - 1) + ri3{t) 

Xi{t) = 0.7Xi(t - 2) + , (15) 
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FIG. 7. (A) Process graph and (B) matrix of lag func¬ 

tions for similarity measures MI, ITY, and MIT (the latter 
two with past-history p aram eter p — 1) for a realization of 
example process (Eq. (15)). All methods implemented in 
CouplingAnalysis have a parameter lagjnode, which can be 
set to ^ all ^ to return all lagged similarities between all pairs 
of nodes (shape (N, N, taunnax + l), or to ^max^ to return two 
(N, N)-matrices of the value at the absolute maximum of each 
panel and the corresponding lags. Note that this results in 
an asymmetric matrix which can be symmetrized by taking 
the maximum of each pair (i,j) and (j, i) with the method 
CouplingAnalysis. syininetrize_by_absmax. 


where pi are independent zero-mean and unit variance 
Gaussian innovations. Here Xi ^2 are auto-correlated and 
Xi drives X 2 at a lag of 2, X 3 at a lag of I, and X 4 at a 
lag of 2 . Figure shows the lag functions for all pairs of 
variables. We illustrate in the following, how the differ¬ 
ent measures MI (similar to CC), ITY, and MIT can be 
used to reconstruct an adjacency matrix of a functional 
network. Regarding a directed link between Xi and X 2 , 
both directions Xi X 2 and X 2 Xi have non-zero 
MI values, making it hard to conclude on a direction. 
Further, the peak of the MI function in Xi X 2 is at 
lag r = 4, even though the driving lag is only 2. The am¬ 
biguity in interpreting the value of MI is discussed irP^ 
and the problem that coupling delays cannot be properly 
inferred with MI because the peak of the lag function is 
shifted for strong auto-correlations is analyzed in Runge, 
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Petoukhov, and Kurthsl^. These shortcomings can be 
overcome with ITY and MIT. Both measures feature a 
much sharper peak at the correct lag r = 2. The value 
of MIT is smaller because it also excludes the effect due 
to the auto-correlation of the driving variable. 

Still, all these bivariate lagged measures show non-zero 
values even if NO physical coupling is present in Eq. (15), 
e.g. in the lower two rows in Fig. Iz^ from X 3 and X 4 to 
the other processes. These artifacts are due to indirect 
links and common drivers (Fig. 1 !^)’ e.g. Xi driving X 3 , 
and X 4 leading to a spurious peak at Xs X 4 . MI and 
the bivariate versions of ITY and MIT discussed here are 
also not able to reliably identify the correct coupling lags 
when multiple lags are present. 


3. Discussion and extensions 

Generally, networks reconstructed from bivariate sim¬ 
ilarity measures can be used to study statistical prop¬ 
erties of associations between time series, but can¬ 
not be interpreted in a causal context. Based purely 
on observational data, a notion of a causal network 
can b e def ined within the framework of time series 
graphS^^^, which can be efficiently estimated by causal 
discovery algo rithm s in a linear framework with par¬ 
tial correlatiorP^^^ or with non-parametric information- 
theoretic estimators as implemented in the causal algo¬ 
rithm proposed in Runge et al^. Based on these causal 
graphs, multivariate versions of ITY and MIT can be 
used to quantify the links’ strength at the correct causal 
la#^. These methods are available in the software pack¬ 
age tigramite, which can be obtained from the web¬ 
site http://tocsy.pik-potsdam.de/tigramite.php as 
a complement to pyunicorn. Note, however, that reliable 
causal analyses, especially with information-theoretic es¬ 
timators, require much more samples than classical bi¬ 
variate analysis, which typically restricts their applicabil¬ 
ity to much smaller networks^. An alternative to clas¬ 
sical path-based network measures is discussed in Runge 
et Runge^^ and introduces quantifiers of informa¬ 

tion transfer through causal pathways. 


Climate networks (class climate .ClimateNetwork) 
represent strong statistical interrelationships between 
time series and are typically reconstructed by thresh¬ 
olding the matrix of a statistical similarity measure S 
(Fig. hl such as those derived from coupling analysis 
(Sect- lmAt : 


A — 

^pq — 


0 ( 5 , 
0 


pq 


■ P) if P 7^ 9, 

otherwise, 


(16) 


where ©(•) is the Heaviside function, ^ denotes a 
threshold parameter, and App = 0 is set for all 

nodes p to exclude self-loops. The threshold param¬ 
eter can be fixed following considerations of statis¬ 
tical significance given a prescribed null hypotheses 
(ClimateNetwork.set_threshold), set individually to 
Ppq for each pair of time series or chosen to achieve 
a desired link density in the resulting climate network 
(ClimateNetwork. set_link_density). 

Certain types of time series preprocessing such as cal¬ 
culation of climatological anomaly values (by subtracting 
phase averages to reduce the first-order effects of the an¬ 
nual cycle) are provided by the climate .ClimateData 
class included in ClimateNetwork. The classes de¬ 
rived from ClimateNetwork (Fig. [^) apply different 
types of similarity measures for network construction, 
e.g. TsonisClimateNetwork for linear Pearson correla¬ 
tion at zero lag or Mutual Inf oClimateNet work for non¬ 
linear mutual information at zero lag. Note that for cli¬ 
mate network analysis of large data sets with more than 
one million time series the paragraph softward^ can be 
used, offering methods and measures comparable to that 
of the TsonisClimateNetwork class. 

In the following, we present two use cases of studying 
fields of single c limatological o bservables using climate 
networks (Sects. Ill B I, III B 2) and one use case of in¬ 
vestigating the coupling structure between two cl imato - 
logical fields using coupled climate networks (Sect. Ill C). 


1. Use case: climate networks for detecting climate 
transitions 


B. Climate networks 

As a typical application of functional networks, eli- 
mate network analysis is a versatile approach for inves¬ 
tigating climatological data and can be seen as a gen¬ 
eralization and complementary method to classical tech¬ 
niques from multivariate statistics such as eigen analysis 
(e.g. empirical orthogonal function or maximum covari¬ 
ance analysis]!^. It has been already successfully used in 
a wide variety of applications, ranging from the complex 
structure of teleconnections in the cli mate systenzP^^^^^^, 
including backbones and bottleneck d^^ l ^^ *, to dynamics 
and pr edictabil ity of the El Niho-Southern Oscillation 

(ENSO)PE^. 


Climate networks (CNs) based on spatial correlations 
of time series have recently been introduced to develop 
early warning indicators for climate transitions. Two 
types of CNs have mainly been used: Pearson Corre¬ 
lation Climate Networks (PCCN), where the Pearson 
correlation (Eq. is applied to measure connectivity 
in the network, and Mutual Information Climate Net¬ 
works (MICN), where mutual information (Eq. is 
applied^. PC CNs and MICNs can be reconstructed by 
the pyunicorn classes climate. TsonisClimateNetwork 
and climate .MutualInfoClimateNetwork, respectively. 

One climate transition of crucial interest is the possi¬ 
ble collapse of the A tlantic Meridional Overturning Cir¬ 
culation as is occurring in simulations of the 

Fast Met Ofhce/UK Universities Simulator (FAMOUS) 
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Earth system Grid points / observation sites 



Functional climate network 


FIG. 8. Workflow of functional network analysis illustrated for climate networks (modified from Donges^®). In step 1, a 
discretized time series representation {xv{t)}^=i of the climatological field(s) of interest is chosen that is usually prescribed by 
the available gridded or station data. Step 2 includes time series preprocessing and the computation of similarity measures Sij 
for quantifying statistical interdependencies between pairs of climatological time series. In step 3, the construction of a climate 
network from the similarity matrix S typically involves some thresholding criterion (see Nocke et a/.^, Tominski, Donges, and 
Nocke^ for details on the climate network shown here that was visualized using the software CGV^. In step 4, the obtained 
climate network is investigated drawing on the tools of complex network theory. Finally, in step 5, the results of this analysis 
need to be interpreted in terms of the underlying dynamical Earth system. 


climate modeP^. Figure shows time series of an¬ 
nual mean Atlantic MOC strength for both the control 
simulation (black curve) and the hosing simulation (red 
curve), at the location where the maximum MOC oc¬ 
curs (at latitude 26°N and 1000 m depth). The hos¬ 
ing simulation performed using the FAMOUS model is 
a freshwater-perturbed experiment, in which the fresh¬ 
water influx over the extratropical North Atlantic is in¬ 
creased linearly from 0.0 Sv to 1.0 Sv (1 Sv = 10^ m^s“^) 
over 2000 year^. One can see that the MOC values of 


the control simulation are statistically stationary over the 
2000-year integration period, while the MOC values for 
the hosing simulation show a rapid decrease between the 
years 800 and 1050. Based on a threshold criterium it 
was found that the MOC collapses at Tc = 874 yearJ^, 
as is shown by the green line in Fig. [^. 

In Feng, Viebahn, and Dijkstral^, PCCNs were 
reconstructed from the MOC field of the FAMOUS 
model using a 100-year sliding window (with a time 
step of one year). It was found that the kur- 
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FIG. 9. (A) Time series of the annual mean MOC (in Sv, 1 Sv = 10® m^s“^) at 26°N and 1000m depth in the Atlantic for the 
control simulation (black curve) and the hosing simulation (red curve) of the FAMOUS model. The green line indicates the 
collapse time Tc = 874 years. (B) The kurtosis indicator Kd gives an early warning signal of the collapse at 738 years and lasts 
for 44 years. The blue diamond marker shows the Kd value for the window of years 591-690 as indicated by blue dashed lines 
in panel (A), and the cyan marker shows Kd for years 671-770 as indicated by cyan dashed lines in panel (A). 


tosis Kd of the climate network’s degree distribu¬ 
tion (TsonisClimateNetwork.degree_distribution) is 
a useful early warning indicator for the collapse. The 
values of Kd for the hosing simulation (red curve) and 
for the control simulation (black curve) are plotted in 
Fig. [^. For the hosing simulation, there is indeed a 
strong increase of Kd significantly exceeding the values 
for the control simulation at 738 years. The classical 
critical slowdown indicators like variance and lag-1 auto¬ 
correlation based on the same MOC data (using the same 
sliding window) do not show any early warnin g si gnal of 
the MOC transition before the collapse time rj^. 

To see why the kurtosis Kd of the degree distribution of 
PCCNs is an effective indicator for the Atlantic MOC col¬ 
lapse, we show in Fig. 10 the mean MOC fields (A,B), the 
degree fields of the PCCNs (C,D) and the degree distribu¬ 
tion (E,F) for two 100-year windows (years 591-690 and 
years 671-770) near the transition. Although the MOC 
is gradually weakening with the freshwater inflow, the 
changes in the MOC pattern are relatively minor. How¬ 
ever, the changes in the degree field are substantial and 
when the freshwater inflow is increased, high degrees in 
the network appear at nodes in the deep ocean (at about 
1000 m depth) at mid-latitudes, especially in the South 
Atlantic. The histograms of the degree fields (the degree 
distributions) for these windows (Fig.pU^,F) show a ten¬ 
dency towards high degree, which is successfully captured 
by the kurtosis Kd- 

The collapse of the Atlantic MOC has been identified 
as one of the important tipping points in the climate 
systenP^, as it will lead to a significantly reduced north¬ 
ward heat transporfPSSHnH^ With the tools of pyunicorn, 
we have provided a novel early warning indicator based 


on climate networks. The particular advantage of such 
an indicator, in contrast to the indicators based on a 
single-point time series, is that it reflects spatial correla¬ 
tions. When applied to data from the FAMOUS model, 
our results show that this kurtosis indicator Kd provides 
a strong early warning signal at least 100 years before 
the transition. 


2. Use case: seasonal and evolving climate network 
analysis of monsoon variability 

Temporal and spatial variability of climate, and thus 
climate network structure, are of increasing interest con¬ 
sidering ongoing environmental changes. Functional cli¬ 
mate networks evolving in time are a promising and use¬ 
ful tool for analyzing spatial and temporal tra nsitions in 
climate and various other climatic phenomengP^ESHinil^ 
In particular, evolving climate networks have been used 
to study seasonal and annual variability of the Indian 
Monsoon system as one of the major global climatic 
subsystems affecting life a nd prosp erity of l/4th of the 
world’s human populatiorP®^^^. On seasonal time 
scales, it is crucial to identify spatial structures of syn- 
chronicity of extreme rainfall events over the Indian mon¬ 
soon domain, as extreme rainfall events are the main 
cause of devastating floods on the subcontinent. On an¬ 
nual time scales, variability of the surface air temperature 
(SAT) field is of great interest, as it influences the total 
amount of rainfall and its spatial distribution during the 
monsoon season. 

a. Data and methodology for network eonstruetion 
In order to study seasonal extreme rainfall variabil- 
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FIG. 10. (A,B) Mean MOC stream function pattern over two 100 year windows (years 591-690 and years 671-770). (C,D) 
Degree field of the Pearson Correlation Climate Network (PCCN) constructed from the MOC data over each window using a 
threshold value of r = 0.5. (E,F) Degree frequency distribution for both cases. 
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FIG. 11. Maps of the number of outgoing undirected links (indicated by color scale) from a set of 153 reference grid points (green 
rectangles), representing synchronization of extreme rainfall events, and the mean surface wind vector (seasonal, 1998-2012), 
the latter indicated by blue arrows. The reference regions are: North Pakistan (top panels) and the Eastern Ghats (bottom 
panels). 


ity, we used observational satellite daily rainfa ll data 
for the period 1998 - 2012 (TRMM 3B42ViP2ESa 
with a spatial resolution of 0.25° ^25 km, extracted 
for the South Asian region (62.375-97.125° E, 5.125- 
39.875° N)). First, we defined time series of extreme 
rainfall events by considering daily precipitation above 
the 90th-percentile for each rainfall time series as 
extreme. Then, we constructed seasonal climate 
networks for three time periods: pre-monsoon (March- 
May), monsoon (June-September) and post-monsoon 
seasons (October-December) using event synchroniza- 
tioip™ - a measure of synchronicity of extreme 
rainfall events between a pair of geographical locations 
(climate.EventSynchronizationClimateNetwork 
class). 

For the analysis of annual SAT variability over the 
Asian monsoon domain we used d aily temperature 
anomaly data (NCEP/NCAR reanalysi^^^ for the Asian 
monsoon region 57.5-122.5° E, 2.5-42.5° N) and con¬ 
struct yearly climate networks for the period 1970- 
2010 based on Pearson correlation at zero lag using the 
climate .TsonisClimateNetwork class. We consider a 
set of 40 static networks obtained from thresholded cor¬ 
relation matrices as one time evolving temporal climate 
network of the Asian Monsoon domain. 

b. Temporal network measures For analyzing the 
annual variability of climate networks of the Asian 


monsoon domain, we use standard network mea¬ 
sures ^for quantifying changes in time evolving net- 
work^^, as described in Radebach et al^^ Tupikina 
et a/.ESH. Specifically, we calculate average path length 
C (TsonisClimateNetwork.average_path_length) and 
transitivity T (TsonisClimateNetwork.transitivity) 
for each time step in the temporal climate network. 

e. Results Analysis of seasonal networks of extreme 
rainfall events revealed two key regions. North Pakistan 
and the Eastern Ghats, which influence the distribution 
and propagation of extreme rainfall over the Indian sub¬ 
continent (Fig. [^. The Eastern Ghats region was pre¬ 
viously known by climatologists as an area influencing 
rainfall over the Indian subcontinent due to its topogra¬ 
phy, causing orographic rainfall. However, the complex 
climate network approach allows us to obtain new in¬ 
sights into the climatology of extreme rainfall events and 
to detect a previously unknown influential region: North 
Pakistan. This finding pinpoints the strong influence of 
climatological phenomena such as western disturbances 
on extreme rainfall events over the Indian subcontinent. 
It opens new possibilities to account for North Pakistan 
as a key region for inferring interactions between the In¬ 
dian Summer Monsoon system and western disturbances, 
and based on this information, to improve the forecast¬ 
ing of e xtreme rainfall events over the Indian subconti- 
nenlpS^. 
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Analysis of the annual variability of the evolving Asian 
monsoon SAT climate network allows us to conclude that 
a highly non-random, deterministic general structure is 
present in the net work on which the inter-annual vari¬ 
ability is imprinte d^^ * ^^ * ^^^ -*. The annual climate network 
variability could be explained by a dominant influence of 
the topography of the region on the climate network as 
well as regular monsoon effects, or by domin ant climatic 
events such as El Nino or La NihsPESHni] Observing 
the changes in temporal climate network properties such 
as average path length C and transitivity T allows to 
investigate this question further (Fig. I^. Most of the 
peaks of C correspond to big El Nino (EN) years, while 
troughs of T correspond to La Nina years according to 
classification of EN in Kug et al^^. This coincides well 
with results concerning the annual variability of global 
temporal climate networkJ^ and indicates the presence 
of teleconnections between El Nino and Indian Monsoon 
region. 

d. Conclusions Understanding the variability and 
evolution of the Indian monsoon and its interactions with 
ENSO remains one of the most vital questions in clima¬ 
tology. Using the pyunicorn toolbox we were able to 
analyze these phenomena and their interactions from a 
climate network perspective. Following this approach re¬ 
vealed the influence of western disturbances and wester¬ 
lies on the synchronicity, spatial structure and seasonal 
dynamics of extreme rainfall events over the Indian sub¬ 
continent and yielded insights into the annual evolution 
of temperature climate networks over the Indian mon¬ 
soon domain, and the influence of ENSO on the Indian 
monsoon system. 


C. Coupled climate networks 


Coupled climate network d^^ * ^^^ * generalize climate 
network analysis to the statistical interdependency 
structures between two or more fields of climato¬ 
logical observables as a network of interacting net¬ 
works (Fig. 1^ and, hence, provide a complemen¬ 
tary approach and generalization of classical meth¬ 
ods of eigen analysis such as maximum covariance 
analysi^^. pyunicorn provides the functionality to 
construct and analyze coupled climate networks via 
the CoupledClimateNetwork class, which inherits from 
ClimateNetwork and InteractingNetworks. In accor¬ 
dance with the n.s.i. framework (Sect. IID), weighted 
versions of all measures are also available in these classes, 
e.g. allowing for an area-weighted computation of all in¬ 
teracting network measures^. This is particularly useful 
when studying coupled climate networks that cover areas 
close to the poles, as in most cases the density of nodes 
in these areas varies strongly due to the regular gridding 
of many climate data sets (e.g. Fig. |^. 

We have applied the coupled climate networks frame¬ 
work to study ocean-atmosphere interactions in the 
Northern hemisphere based on the monthly HAD IS ST 


sea-surface temperature (SST^fnH and the 500 mbar 
geopotential h eight (HGT) fields from the ERA40 re¬ 
analysis projecfP^ for all nodes northward of 30° N lati¬ 
tude and using the linear Pearson correlation coefficient 
at zero laPl. 

Local interacting network measures allow for the de¬ 
tection of regions in one field that couple with the other 
field and additionally provide a notion of the resulting 
coupling strength and structure (Fig. 13). The n.s.i. 
cross-degree density 




~ wC 


(17) 

(18) 


measures the weighted share of nodes in another sub¬ 
network Vj that each node v £ Vi is connected with. 
It is obtained by normalizing the n.s.i. cross-degree 
kr (interactingNetworks .nsi_cross_degree) by the 
sum over all node weights in the opposite subnetwork 
Wj = Wq such that its values range between 0 and 

1. Thus, high values indicate a strong localized coupling 
between the fields or climate subnetworks. Figure 
shows the n.s.i. cross-degree density for nodes 

in the SST field (subnetwork index s). We find sev¬ 
eral localized areas in the Atlantic as well as the Pacific 
that show strong coupling with the HGT field (subnet¬ 
work index i). In contrast, the n.s.i cross-degree density 
for nodes in the HGT field shows large areas 
of pronounced coupling with the SST field (Fig. [T^). 
It should be noted, however, that this measure by defini¬ 
tion does not contain any information on the interactions 
within each of the fields. 

Additionally, the n.s.i. local cross-clustering coefficient 


Qij:¥ _ 


'^p,qeVj ^vp'^P^pq'^q^qv 

Wy 


(19) 


indicates whether two neighbors in subnetwork 
Vj of a considered node v E Vi are also mutu¬ 
ally connected and, hence, measures the weighted 
share of triangular structures between both sub¬ 
networks. It is computed using the method 
InteractingNetworks. nsi_cross_Iocal_clustering. 
We note that generally the n.s.i. cross-local clustering 
coefficient takes lower values for nodes in the SST field 
{Cy = C**, Fig. then for nodes in the HGT field 

= C^*, Fig. |13p) implying that nodes in the SST 
field couple preferably with nodes in the HGT field, 
which are themselves dynamically dissimilar and, hence, 
disconnected. 

In fact, we find that the ocean-to-atmosphere interac¬ 
tion in the Northern hemisphere follows a hierarchical 
structurd^, meaning that larger areas of dynamically 
similar nodes in the SST field couple with several dynam¬ 
ically dissimilar areas in the HGT fielcP^. This feature 
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Time (years) 



FIG. 12. Evolving average path length C (A) and transitivity T (B) network measures calculated for each yearly time step 
in the temporal climate network constructed for the period (1970-2010). Most of the peaks of C correspond to big El Nino 
(EN) years (red vertical bars), troughs of T correspond to La Nina events (blue vertical bars) where color intensity of the bar 
corresponds to EN event strength. The data has spatial resolution 2.5° x 2.5° covering the area between 2.5°S to 42.5°N and 
57.5°E to 122.5°, i.e. 468 nodes. 




EIG. 13. Goupled climate network analysis of ocean- 
atmosphere interactions northward of 30° N: N.s.i. cross¬ 
density (A) for nodes in the sea surface temperature field and 
(B) for nodes in the geopotential height field at 500 mbar. (G, 
D) The same for the n.s.i. local cross-clustering coefficient. 


in observational time serie^^^^. While time series net¬ 
works can reflect the dynamical properties of time se¬ 
ries obtained from a complex system in a smorgasbord of 
different ways, pyunicorn focusses on two complemen¬ 
tary approaches: (i) Reeurrenee network P^ ^ ‘^^ \ an ap¬ 
proach closely related to reeurrenee quantifieation analy- 
sis of reeurrenee plots^ are random geometric graphJ^^^HH 
representing proximity relationship s (link s) of state vec¬ 
tors (nodes) in phase space (Sect. IV A[ ). (ii) Visibility 
graphs encode visibility relations betwee n data points 
in the one-dimensional time domain (Sect. lEl Lacasa 
et Donner and Donges^^. Hence, while recur¬ 

rence networks allow to investigate geometric p rope rties 
of the system such as the transitivity dimensiorP^, vis¬ 
ibility graphs can be applied to investigate purely tem¬ 


poral features such as long-range correlations^^ or time- 
reversal asymmetr}®]. Network-based time series anal¬ 
ysis is demonstrated by discussing two use cases from 
paleoclimatology that aim at detecting regime shifts or 
tipping points in climate dynamics on longer time-scales. 


A. Recurrence analysis 


may be attributed to large-scale ocean currents interact¬ 
ing with different parts of the atmosphere along their 
respective directions of flow. 

IV. NETWORK-BASED TIME SERIES ANALYSIS 

Network-based methodologies provide valuable novel 
approaches to nonlinear time series analysis that have 
manifold applications ranging from studying the detailed 
geometrical structure of a dynamical system in phase 
space to detecting critical transitions or tipping points 


Recurrence is a fundamental property of many dynam¬ 
ical systems and is used by several approaches in order to 
investigate a system’s dynamics. A basic tool of non lin- 
ear time series analysis is the binary reeurrenee matrio^^ 

= ©(e-l|x(p)-x(9)||), (20) 

where x(p) is a state vector at time p = 1,..., V, N the 
number of states, ©(•) again the Heaviside function, and 
5 the recurrence threshold. A graphical representation, 
the reeurrenee plok provides visual, qualitative impres¬ 
sions about the dynamics of even high-dimensional sys¬ 
tems. Quantitative approaches based on this matrix have 
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been developed for studying different aspects of complex 
systems, particular ly b ased on univariate and multivari¬ 
ate time series datsP^. Recurrence quantification analy¬ 
sis (RQA) and recurrence network analysis (RNA) allow 
to classify different dynamical regimes in time series and 
to detect regime shifts, dynamical transi tion s or tipping 
points, among many other application^^^. Bivariate 
methods such as joint recurrence plots/networks, cross¬ 
recurrence plots or inter-system recurrence networks can 
be used to investigate the coupling structure between two 
dynamical systems based on their time series, including 
methods to detect the directionality of couplinj ^^ l ^^^ l ^^^l 
Recurrence analysis is applicable to general time series 
data from many fields such as climatology, medicine, neu¬ 
roscience or economic^^^. These applications range from 
using recurrence analysis as a classifier for m onitoring 
health states in medicine and engineering!^ to detect¬ 
ing continental-scale nonlinear regim e shi fts in the Asian 
monsoon system during the Holocend!^. 



1. Recurrence quantification analysis 

Recurrence of a dynamical system is usually stud¬ 
ied in phase space. A standard approach is to use 
time-delay embedding for reconstructing the appropri¬ 
ate phas e space representation from a measured time 
serieJI^. 

The class timeseries. RecurrencePlot can be used to 
generate recurrence plots and perform recurrence quan¬ 
tification analysis. The parameters dim and tan can 
be used to set the parameters of the time-delay em¬ 
bedding, whereas threshold or recurrence_rate as 
well as metric configure the recurrence criteria (i.e. 
used recurrence norm and threshold s). The method 
RecurrencePlot. embedding can be used to get the re¬ 
constructed phase space vectors resulting from the spec¬ 
ified embedding parameters. For example, a recurrence 
plot can be computed from a given scalar time series x{t) 
using the following code: 

rp = RecurrencePlot(x, recurrence_rate=0.05, 
dim=3, tau=30) 

The recurrence matrix R (Eq. [2Q| ) can be extracted 
as the property RecurrencePlot. R and can be plotted 
(Fig. p!4|), e.g. using matplotlib, or used for subsequent 
analysis. 

The simplest quantifier in recurrence analysis is the 
probability that any recurrence will occur, i.e. the 
fraction positive entries in R, called recurrence rate 
(RecurrencePlot. recurrence_rate). However, infor¬ 
mation about dynamical properties of the system is rep¬ 
resented by the diagonal and vertical lines apparent in 
the recurrence plot. The line length distributions are, 
thus, the foundation for statistical, quantitative analysis 
of the recurrence mat rix R (Eq. ([^)), called recurrence 
quantification analysi^^. Moreover, the empty verti¬ 
cal spaces in R, apparent as white vertical lines in the 


FIG. 14. Recurrence plot of a time serie s generated by the 
Lorenz ’63 system in a chaotic regim^!^. 


recurrence plot, correspond to recurrence times. Sev¬ 
eral measures of complexity using these line distribu¬ 
tions (diagonal, vertical, and white lines) are available 
as methods in the RecurrencePlot class, e.g. maxi¬ 
mal diagonal line length (max_diaglength), determin¬ 
ism (determinism), laminarity (laminarity), diagonal 
line entropy (diag_entropy) or mean recurrence time 
(mean_recurrence_time). The distributions of diagonal 
and vertical lines (diagline_dist and vertline_dist) 
can be useful for further quantifications, e.g. by look¬ 
ing at the sca ling behavior, which is related to the 
K 2 entrop^ff^. Resampled instances of both types of 
line distributions can be generated using the methods 
resample_diagline_dist and resample_vertline_dist 
for estimating confidence bounds for RQA measur es fo l- 
lowing the permutation-based method proposed 

pyunicorn furthermore supports multi¬ 
variate extensions of RQA such as joint 
(timeseries.JointRecurrencePlot) and cross re¬ 
currence plots (timeseries.CrossRecurrencePlot) 
that both inherit from the RecurrencePlot class. 


2. Recurrence network analysis 


The striking similarity of the binary square recurrence 
matrix (Eq. 20) with the adjacency matrix (Eq. of an 
unweighted and undirected network has lead to a com¬ 
plementary kind of recu rrence analysis by measures from 
complex network studie^^^^. More formally, the recur¬ 
rence networks (Fig. \lb^ defined in this way by their ad- 









20 



X 


FIG. 15. Recurrence network as a random geometric graph 
generated from = 750 state vectors (red filled circles) 
drawn from the i nvar iant density of the Lorenz ’63 system 
in a chaotic regimj^^. In this example, state vectors (nodes) 
with a distance smaller than e: = 2 according to the supremum 
norm are connected by a link. 


transitivity dimension 


Dr{s) 


iogr(£) 

log(3/4)’ 


(23) 


a global dimension-like measure of the geometric orga¬ 
nization of the available set of state vectors in phase space 
(RecurrenceNetwork. transitivity_dim_single_scale) 
Analogously, a transformed local clustering co¬ 
efficient yields a local dimension-like measure 
that is defined on every node or state vector 
(RecurrenceNetwork. local_clustering_dim_single_ 
scale). 

Analogously to multivariate RQA (Sect. IIVAIH , 
multivariate extensions of recurrence network analysis 
have been applied to investigate directions of coupling 
between dynamical systems^ and complex synchro niza- 
tion scenarios including generalized synchronizatiorP^. 
The corresponding methodologies are represented by the 
classes timeseries.InterSystemRecurrenceNetwork 
and timeseries.JointRecurrenceNetwork, respec¬ 
tively. 


jacency matrix 

^pq-s 

where 5pq is Kronecker’s delta introduced to avoid 
self-loops in the networks, can be understood as ran¬ 
dom geometric graphs (Fig. that capture rich in¬ 
formation on the geometrical structure of a dyn amical 
system’s invariant density in phase spacd^^^^^. The 
nodes in a recurrence network represent state vectors 
and the links indicate proximity relationships between 
them. Recurrence networks are represented by the 
timeseries .RecurrenceNetwork class inheriting from 
Network and RecurrencePlot (Fig. [^). 

This approach opens up the wealth of measures, mod¬ 
els and algorithms from complex network theory for 
time series analysis. Network measures such as average 
path length (Network.average_path_length), global 
clustering coefficient (Network. global_clustering), 
transitivity (Network.transitivity) or assortativity 
(Network.assortativity) characterize the geometrical 
properties of the dynamical system trajectories in phase 
space and can be used to different iate betw een dynamical 
regimes,e.g. periodic and chaoticP^^^^E^. In particular, 
the transitivity 

q-/ \ _ ^^v,p,q£V Avp{e)Apq{e)Avq{e) 

is appropriate because it can be lin ked to the geometry 
of the phase space trajector^EE^. Specifically, it can 
be logarithmically transformed to yield the single-scale 


3. Use case: identification of transitions in paleoclimate 
variability 

A relevant application of recurrence analysis is the de¬ 
tection of dynamical transitions in comple x syst ems cap¬ 
tured by model or observational time serie^^^^^^. Detect¬ 
ing s uch dy namical transitions, regime shifts or tipping 
pointJSmS] great interest in studying past climate 
variability to gain a deeper understanding of the Ear th’s 
climate system also on geological time scale^^^^^^^. In 
the following we discuss a typical example fro m p aleo- 
climate research following Marwan and Kurths^^^ that 
focusses on investigating interactions between sea-surface 
temperature (SST) and the dynamics of specific climate 
subsystems, such as the Asian monsoon system or the At¬ 
lantic thermohaline circulation, as well as regime shifts 
therein. 

Diverse types of geological archives are used in paleocli- 
matology to reconstr uct and study climate co nditions of 
the past, such as lakd^ ^ and marine sediment d^^^ l ^^^ l ^^ 
or speleothem j ^ F Alkenone remnants in the or¬ 
ganic fraction of marine sediments, produced by phyto¬ 
plankton, can be used to reconstruct SSTs of the past 
(alkenone paleothermometry), which all ows to investi¬ 
gate past oceanic temperature variabilit}®^^^^. In this 
use case, we analyze an SST reconstruction for the South 
China Sea covering the past 3 Ma that is derived from 
alkenone paleothermometry of the Oce an Drilling Pro¬ 
gramme (ODP) site 1143 drill cor^SH (Fig. [T^). The 
South China Sea is strongly connected to the East Asian 
Monsoon (EAM) system encompassing a winter monsoon 
season with strong winds and a summer monsoon season 
with particularly high precipitation. 

We generate RecurrenceNetwork objects 
and compute the measures determinism DET 
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A 


from loess sediments that the EAM summer monsoon 




weakened between 2.0 and 1.5 Ma BP and around 0.7 Ma 
BP. During these periods, DET and T assume lower val¬ 
ues and, hence, more irregular SST variability, than dur¬ 
ing the previously discussed epochs. 

In this way, recurrence analysis by means of the mea¬ 
sures DET and T confirms earlier findings of strong links 
between the EAM and Milankovich cycles. Eurthermore, 
the analysis of recurrence properties allows for deeper 
insights, such as that dominant Milankovich cycles and 
periods of major climate transitions from one to another 
regime go along with increased and reduced regularity in 
the (regional) climate dynamics in the East Asian Mon¬ 
soon system (as reflected by the South China Sea SST 
and for the considered time scales). 


FIG. 16. Studying dynamical transitions in paleoclimate data 
using recurrence analysis: (A) Alkenone paleothermometry 
based SST estimates for the South China Sea and (B) corre¬ 
sponding determinism DET and transitivity T. 


(RecurrenceNetwork. determinism) and transitiv¬ 
ity T (RecurrenceNetwork.transitivity) for sliding 
windows of length 410 ka (containing a varying number 
of data points due to the time series’ irregular sampling) 
and a step size of 20 ka (Eig. [T^). Eor reco nstructing 
the phase space by time-delay embeddin^^, we select 
an embedding dimension o f 6 ( as suggested by the false 
nearest neighbors criteriorP^ . The selection of the 
time-delay parameter is guided by the auto-correlation 
function. As a result, it is approximated as 20 ka for 
all time windows based on the median sampling time 
within each window. The recurrence threshold is chose n 
to preserve a prescribed recurrence rate of 

During the past 3 Ma, several major and many smaller 
climate changes occurred on regional, but also global 
scales. Particularly pronounc ed clim ate shifts have been 
related to Milankovich cycle^^H^^ and major changes 
in ocean circulation pattern^^^. Eollowing a transition 
towards obliquity-driven climate variability with a 41 ka 
period around 3.0Ma BP (before present), a long period 
of globally warm climate ended and Nor thern hemi sphere 
glaciations started after 2.8-2.7Ma This 

transition is revealed by a significant increase of DET 
and T between 2.8 and 2.2 Ma signifying an increased 
regularity of SST dynamics during this period. Erom de¬ 
tailed studies of loess sediments it is known that around 
1.25 Ma BP, the intensity of the EAM winter monsoon 
season sta rted to be strongly coupled to global ice-volume 
chang^SH, Around this time also marking the beginning 
of a transition phase towards glacial-interglacial cycles of 
100 ka period (eccentricity-dominated period of the Mi¬ 
lankovich cycles), DET and T increase markedly again. 
Dominance of the 100 ka period was well established after 
0.6 Ma and is clearly indicated by incr eased DET and T 
values between 0.6 and 0.2 Ma BpEm 

It is also known 


B. Visibility graphs 

Visibility graph (VG) methods represent an alterna¬ 
tive approach for transforming time series into complex 
network^^, which draws upon analogies between height 
profiles in physical space and the profile of a time series 
graph. Originally utilized in fields like architecture and 
robot motion planning, VGs are based on the existence 
or non-existence of lines of sight between well-defined ob¬ 
jects. 


1. Theory of time series visibility graphs 


In a time series context, these objects are the sam¬ 
pling points of a (univariate) time series graph, which are 
uniquely characterized by pairs (ty^Xy) with Xy = x{ty). 
Erom a practical perspective, we can identify each node 
of a standard visibility graph with a given time point 
ty. Eor ty < tp (and, hence, v < p) a. link between the 
nodes v and p exists iff 


Xq<Xy + - tv) ^ V < q < p. (24) 

ip ty 

Put differently, the topological properties of VGs are 
closely related to the roughness of the underlying time 
series profile. 

As a notable algorithmic variant, horizontal visibility 
graphs (HV Gs) facilitate analytical investigations of the 
graph profile^Sl. In this case, Eq. (24) is replaced by the 
simpler condition 


Xq < mm{xy,Xp} V V < q < p. (25) 

One easily convinces oneself that the latter condition is 
more restrictive, and that the link set of a HVG is a 
subset of that of the standard VG. In turn, the classical 
VG is invariant under arbitrary affine transformations 
of a time series, whereas the HVG retains this invariance 
only for uniform translations and rescaling of the original 
data. 
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In pyunicorn, VG and HVG can be gen¬ 
erated from any time series object via the 
timeseries.VisibilityGraph class. The decision 
on which of the two variants is used is mediated via 
the Boolean parameter horizontal, which can be set 
to False (standard VG, default value) or True (HVG). 
Additional information on the timings of individual 
observations as well as missing values can be provided 
via supplementary optional parameters. 

The distinctively different construction mechanism 
of ( H)VGs in comparison to recurrence networks 
(Sect. IVA2) implies a completely different interpreta¬ 
tion of the resulting networks. First, recurrence networks 
are equally applicable to uni- and multivariate time se¬ 
ries, whereas VGs are tailored for applications to univari¬ 
ate records. So far, there is no way to unambiguously 
generalize the VG concept to bi- or multivariate time se¬ 
ries. 

Second, both typ es of time series networks are spa¬ 
tial networks (Sect. liiBl ) in an abstract sense - recur¬ 
rence networks’ nodes being characterized by the posi¬ 
tions of the associated state vectors in the dynamical 
system’s phase space, and those of VGs being fixed along 
the time axis representing some abstract one-dimensional 
space. This fact implies strong restrictions to the result¬ 
ing graph propertied, including potentially severe biases 
of (especially path-based) network properties due to the 
absence of information prior to the first, as well as af¬ 
ter the last sampling point of the time series. To this 
end, simplified functionalities correcting for such effects 
have been implemented for the case of degree and close¬ 
ness centrality measures (boundary_corrected_degree 
and boundary_corrected_closeness, respectively). An 
improved treatment is planned for future releases of 
pyunicorn, but still requires further analytical under¬ 
standing of the VG properties. 

Third, while power-law degree distributions of recur¬ 
rence networks are related with spe cific singularities of 
the probability density of stated^, they are a widely 
observable feature of VG s that arises from the presence 
of long-range correlationd^^^^. Specifically, under the 
assumption of a fractional Brownian motion or related 
stochastic process, the characteristic scaling exponent 
of the degree distribution of a VG can be directly re¬ 
lated with the Hurst exponent of the series, whereas 
there ar e no similar unique relationships for recurrence 
networkHowever, given the multiplicity of existing 
estimators of the latter quantity, the practical advantage 
of using VGs and HVGs for this purpose has not yet been 
convincingly demonstrated. 

Finally, given the direct association between nodes 
and time points, VGs and HVGs provide a means to 
discriminate the statistical properties of a graph when 
looking forward and backward in time, respectively, 
from a given observation point. This gives bi rth to 
a new class of time series irreversibility test j^^ * ^^ ^ 
which allow to evaluate the null hypothesis of linearity 
since irreversibility is a common hallmark of nonlinear 


dynamic^^^. Specifically, local (node-wise) graph 
properties can be decomposed into contributions from 
either past or future observations {retarded (backward) 
vs. advaneed (forward)), as well as such combining in¬ 
formation from past and future {trans). Corresponding 
features are implemented in pyunicorn for a variety of 
network properties, including degree (retarded_degree 
vs. advanced_degree), local clustering coefficient 
(retarded/advanced_local_cluster ing), closeness 
(retarded/advanced_closeness) and betweenness 
(retarded/advanced/trans_betweenness). 


2. Use case: time irreversibility in glacial-deglacial 
dynamics 


VG-based tests for time series irreversibility have been 
successfully applied to discriminate between different dy¬ 
namical regimes in EEG recordings from healthy and 
epileptic patient^^. It is instructive to also apply this 
metho dolo gy to study regime shifts in paleoclimate dy¬ 
namical^, particularly for investigating in depth the im¬ 
portant differences between interglacial and glacial cli¬ 
mate variability and the nature of the transitions be¬ 
tween those regimeAs an example, we study the 2 m 
resolution iso tope record from the GISP2 ice core 

from GreenlancP^ convering both the Holocene and the 
last glacial period (Eig. 17). The boundary between the 
last glacial and the Holocene is defined by the You nger 
Dryas-Preboreal transition at about 11,650 years BP^^. 

We apply the test for time irreversibility proposed 
in Donges, Donner, and Kurths^^, which is based 
on comparing the distributions of time-direeted degree 
(VisibilityGraph.retarded/advanced_degree) 


K= Y. ( 26 ) 

pEV :p<v 

Ayp (27) 

pEV :p>v 

and time-direeted loeal eluster- 

ing eoeffieient (VisibilityGraph. 

retarded/advanced_local_clustering) 


' E (28) 

^ ^ P,qEV:p,q<v 

Cv = {^;) E (29) 

^ '' p,q£V:p,q>v 


using the Kolmogorov-Smirnov test. To ensure the ro¬ 
bustness of the results, a /c-leave-out cross-validation is 
conducted by generating ensembles of 100 realizations 
for each record by bootstrapping 80% of the data points 
without changing their time ordering and, subsequently, 
performing the test to each ensemble member. Eollowing 
this protocol for the Holocene and the last glacial pe¬ 
riod time series segments separately, we find that the 
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Time (ka BP) 

FIG. 17. Visibility graph of the GISP2 ice core record (2m resolution) from Greenland during the last glaciaP^. Black 
dots indicate observations (nodes). Links are added between mutually visible observations (gray lines). 


TABLE 11. Results of the visibility graph-based tests for time 
irreversibility of the GISP2 ice core record from Green- 
lan(PSl. The number of data points in each time series seg¬ 
ment is denoted by M, while qu and qc give the rate of re¬ 
jection for the null hypothesis of time reversibility (at a 95% 
significance level) during a /c-leave-out cross-validation for de¬ 
gree and clustering-based tests, respectively. 


Period Time span (ka BP) M qu qc 

Last glacial 110.98 - 11.65 566 0.93 0.82 

Holocene 11.65 - today 824 0.00 0.00 


null hypothesis of reversibility can be quite safely re¬ 
jected for the last glacial, whereas this is not the case 
for the Holocene (Tab. 0 - Since the conditions for the 
deposition of ice and snow at the drilling site are ar¬ 
gued to have remained more or less constant through¬ 
out the entire time span covered by the record, these 
results point at a strong signature of nonlinear climate 
dynamics during the last glacial, which is not detected 
for the Holocene, consistently with results for Holocene 
speleothem record^. One visually directly accessible 
piece of evidence for the indicated irreversible dynam¬ 
ics during the last glacial period are the frequently oc¬ 
curring Dansgaard-Oeschger events that are character¬ 
ized by rapid warming (change towards more positive 
values) and subsequent slower cooling (change to¬ 
wards more negative values) of Greenland climate 

(Fig. pr] ). Note that the sampling times increase with 
age for the GISP2 as for all ice core proxy records (in 
GISP2, the average sampling time is 14.2 ± 6.3 a for the 
Holocene, while it is 175.8±119.6a during the last glacial 
period), because the ice column is compressed by ice and 
snow deposited on top of it over the years. It remains to 
be tested in future research whether this nonstationarity 
has a strong impact on the results of the VG test for time 
irreversibility. 


V. SURROGATE TIME SERIES 


As mentioned in several contexts above and analo¬ 
gously to random network models (see, e.g. Sect. 


IIB 


on random surrogate models for spatial networks and 
Sect. |II C| on surrogate models for networks of net¬ 
works), surrogate time series are a useful methodol¬ 
ogy for testing the statistical significance of observed 
time series properties such as those derived from func¬ 
tional networks or network-based tim e series analysis 
based on various null hypotheseJ^^. The idea be¬ 
hind this approach is to generate surrogate time series 
that conserve certain properties of observed time series 
such as the amplitude distribution or auto-correlation 
function, but are random otherwise, pyunicorn can 
be used to generate several commonly used types of 
time series surrogates (timeseries.Surrogates class): 
white noise surrogates^ Fourier surro gate s, amplitude ad- 
juste d F ourier transform surrogates'^ or twin surro¬ 
gate^^. White noise surrogates correspond to ran¬ 
domly shuffled copies of a time series x{t) and, hence, 
retain only the amplitude distribution of the original 
data (Surrogates .white_noise_surrogates). In con¬ 
trast, Fourier surrogates are generated by randomiz¬ 
ing the phase components of x{t) in Fourier space and 
conserve the time series’ power spectrum (and thus 
also its linear auto-correlation function via the Wiener- 
Chintschin theorem), but not the amplitude distribution 
(Surrogates. correlated_noise_surrogates). This 
potential drawback is addressed by amplitude adjusted 
Fourier transform (AAFT) surrogates that only approx¬ 
imate the power spectrum but conserve the amplitude 
distribution (Surrogates .AAFT_surrogates). An even 
closer match is provided by iteratively refined AAFT sur¬ 
rogates (Surrogates. ref ined_AAFT_surrogates). Fi¬ 
nally, twin surrogates approximate the recurrence struc¬ 
ture of the original time series and, hence, have the po¬ 
tential to additionally conserve certain nonlinear proper¬ 
ties of x(t) such as the auto-mutual information function 
(Surrogates .twin_surrogates). Note that the surro- 
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gate types supported by pyunicorn only conserve cer¬ 
tain properties of sin gle t ime series, whereas multivari¬ 
ate surrogate methods^^ conserving properties of pairs 
of time series such as the linear cross-correlation function 
are currently not implemented. 

When computing functional networks, surrogate time 
series can be used to include links in the network based 
on a fixed significance level instead of a fixed threshold 
or link density in terms of the considered similarity mea - 
sure from coupling analysis (see, e.g. Pains et 
In network-based time series analysis, surrogate time se¬ 
ries provide a means to test the statistical significance 
of results obtained for the data at hand based on a hi¬ 
erarchy of null hypotheses, for example, when se arching 
for dynamical transitions and regime shiftor when 
studying the direct ionalit y of coupling or synchronization 
between time serie^^^^^^. 


VI. CONCLUSIONS AND PERSPECTIVES 

In this article, we have described the pyunicorn 
software package, which facilitates the study of var¬ 
ious types of complex networks as well as a detailed 
investigation of time series data using modern methods 
of functional network and network-based nonlinear 
time series analysis. pyunicorn is written in the 
programming language Python and, hence, is con¬ 
veniently applicable to research domains in science 
and society as different as neuroscience, infrastructure 
and climatology. Most computationally demanding 
algorithms are implemented in fast compiled languages 
on sparse data structures, allowing the performant 
analysis of large networks and time series data sets. 
The software’s modular and object-oriented architecture 
enables the flexible and parsimonious combination of 
data structures, methods and algorithms from differ¬ 
ent fields. For example, combining complex network 
theory (core. Network class) and recurrence plots 
(timeseries .RecurrencePlot class) yields recurrence 
network analysis (timeseries. RecurrenceNetwork 
class), a versatile framework that opens new perspec¬ 
tives for nonlinear time series analysis and the study of 
complex dynamical systems in phase space. Another ex¬ 
ample are climate networks (climate. ClimateNetwork 
class), an approach bringing together ideas and concepts 
from complex network theory and classical eigen analysis 
of climatological data (e.g. empirical orthogonal function 
analysis]!^. 

Along these lines, pyunicorn has the potential to facil¬ 
itate future methodological developments in the fields of 
network theory, time series analysis and complex systems 
science by synthesizing existing elements and by adding 
new methods and classes that interact with or build upon 
preexisting ones. Nonetheless, we urge users of the soft¬ 
ware to ensure that such developments are theoretically 
well-founded and explicable as well as motivated by well- 
posed and relevant research questions to produce high- 


quality research. 
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